Accelerating Large Scale Centroid-based Clustering with Locality Sensitive Hashing

Abstract

Most traditional data mining algorithms struggle to cope efficiently with the sheer scale of data. In this paper, we propose a general framework that accelerates existing clustering algorithms on large-scale datasets containing large numbers of attributes, items, and clusters. The framework uses locality sensitive hashing (LSH) to significantly reduce the cluster search space. We also prove theoretically that the framework has a guaranteed error bound on clustering quality. The framework applies to centroid-based clustering algorithms that assign each object to its most similar cluster, and we use the popular K-Modes categorical clustering algorithm to show how it can be applied. We validated the framework on five synthetic datasets and a real-world Yahoo! Answers dataset. The experimental results show that the framework speeds up the existing clustering algorithm by a factor of 2 to 6 while maintaining comparable cluster purity.
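
The abstract does not spell out the hashing scheme, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes bit-sampling LSH for Hamming distance (an assumption chosen because K-Modes compares categorical records by mismatch count) and uses hypothetical helper names (`make_tables`, `index_modes`, `assign`). Cluster modes are hashed into buckets, and each record is compared only against the modes that share a bucket with it, rather than against every cluster.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def make_tables(num_attrs, num_tables, bits_per_table, seed=0):
    """For each hash table, pick a random subset of attribute positions."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(num_attrs), bits_per_table))
            for _ in range(num_tables)]


def signature(record, positions):
    """Bit-sampling LSH key: the record's values at the sampled positions."""
    return tuple(record[p] for p in positions)


def index_modes(modes, tables):
    """Map each (table, key) bucket to the ids of the cluster modes in it."""
    buckets = defaultdict(set)
    for cid, mode in enumerate(modes):
        for t, positions in enumerate(tables):
            buckets[(t, signature(mode, positions))].add(cid)
    return buckets


def assign(record, modes, tables, buckets):
    """Assign a record to the nearest mode among its LSH candidates."""
    candidates = set()
    for t, positions in enumerate(tables):
        candidates |= buckets.get((t, signature(record, positions)), set())
    if not candidates:  # no collision: fall back to the exhaustive search
        candidates = range(len(modes))
    # Ties on distance are broken by the smaller cluster id.
    return min(candidates, key=lambda cid: (hamming(record, modes[cid]), cid))


# Toy usage: three cluster modes over four categorical attributes.
modes = [("a", "x", "p", "m"), ("b", "y", "q", "n"), ("c", "z", "r", "o")]
tables = make_tables(num_attrs=4, num_tables=3, bits_per_table=2)
buckets = index_modes(modes, tables)
print(assign(("a", "x", "p", "n"), modes, tables, buckets))  # prints 0
```

In this sketch the shortlist can miss the truly nearest mode when the two share no bucket, so the assignment is approximate. Raising `num_tables` or lowering `bits_per_table` makes collisions more likely at the cost of more comparisons per record.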
